feat(arrow): Use field name for lookup when field_id in parquet is unavailable #1566

Closed

fvaleye
Contributor

@fvaleye fvaleye commented Jul 30, 2025

Which issue does this PR close?

What changes are included in this PR?

This PR improves compatibility with Arrow data sources like DataFusion that may not provide the PARQUET:field_id metadata in their schemas.
When field_id is unavailable, this change introduces a fallback mechanism to look up fields by name. This makes the integration more robust and prevents lookup failures.

The key changes include:
- Field Lookup Fallback: The core logic now uses the field name for column lookups when field_id is missing.
- Enhanced Error Messages: The error message for a failed lookup is now more descriptive, making it easier to debug schema issues.
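
For readers who have not seen the diff, the fallback described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `ArrowField`, `TargetColumn`, `field`, `field_id_of`, and `find_position` are hypothetical stand-ins, and the error wording is invented. The only detail taken from the description is the `PARQUET:field_id` metadata key.

```rust
use std::collections::HashMap;

/// Metadata key that Parquet-aware writers attach to Arrow fields.
const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone)]
struct ArrowField {
    name: String,
    metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for the table column being looked up.
#[derive(Debug, Clone)]
struct TargetColumn {
    id: i32,
    name: String,
}

/// Helper for building fields with or without a field id (illustration only).
fn field(name: &str, field_id: Option<i32>) -> ArrowField {
    let mut metadata = HashMap::new();
    if let Some(id) = field_id {
        metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
    }
    ArrowField { name: name.to_string(), metadata }
}

/// Reads the field id from metadata; missing or unparsable values yield None.
fn field_id_of(f: &ArrowField) -> Option<i32> {
    f.metadata
        .get(PARQUET_FIELD_ID_META_KEY)
        .and_then(|v| v.parse::<i32>().ok())
}

/// Finds the position of `target` among `fields`.
/// Step 1: match on field id. Step 2 (fallback): match on name, but only
/// against fields that carry no usable field id.
fn find_position(fields: &[ArrowField], target: &TargetColumn) -> Result<usize, String> {
    let by_id = fields.iter().position(|f| field_id_of(f) == Some(target.id));
    let by_name = || {
        fields
            .iter()
            .position(|f| field_id_of(f).is_none() && f.name == target.name)
    };
    by_id.or_else(by_name).ok_or_else(|| {
        // A descriptive error lists what was searched for and what was available.
        let available: Vec<&str> = fields.iter().map(|f| f.name.as_str()).collect();
        format!(
            "field id {} (name '{}') not found; available Arrow fields: {:?}",
            target.id, target.name, available
        )
    })
}
```

In this sketch the ID match takes priority because Iceberg identifies columns by ID, so a renamed column still resolves correctly when IDs are present. The name match is only a fallback for fields without a usable ID.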

Are these changes tested?

Yes. A new unit test has been added to specifically validate the name-based fallback logic, ensuring this new behavior is covered.

@CTTY
Contributor

CTTY commented Jul 30, 2025

Is this a duplicate of PR?

@fvaleye
Contributor Author

fvaleye commented Jul 31, 2025

> Is this a duplicate of PR?

Oh, I actually missed this PR. I started working on this after reading the issue to get familiar with the codebase. Would you like me to close this one?

…ata is unavailable

When reading Arrow data from sources that don't provide the PARQUET:field_id metadata (like DataFusion), the column lookup failed.
This change introduces a fallback mechanism to look up fields by name if the field ID is not present in the Arrow field metadata. This improves compatibility with various Arrow data sources.

The commit also includes:
- A new unit test to verify the name-based fallback logic.
- A more detailed error message when a field can't be found.

@fvaleye fvaleye force-pushed the feat/arrow-field-lookup-fallback branch from feb2918 to 6f614f7 July 31, 2025 07:14
@fvaleye
Contributor Author

fvaleye commented Jul 31, 2025

Closing this as a duplicate of PR
Thanks for pointing that out @CTTY!

@fvaleye fvaleye closed this Jul 31, 2025
Development

Successfully merging this pull request may close these issues.

Need to use field.name to determine arrow field's position when PARQUET:field_id is unavailable